Fault Tolerant Scheduling in Distributed Networks
Authors
Abstract
We present a model for application-level fault tolerance for parallel applications. The objective is to achieve high reliability with minimal impact on the application. Our approach is based on a full replication of all parallel application components in a distributed wide-area environment in which each replica is independently scheduled in a different site. A system architecture for coordinating the replicas is described. The fault tolerance mechanism is being added to a wide-area scheduler prototype in the Legion parallel processing system. A performance evaluation of the fault tolerant scheduler and a comparison to the traditional means of fault tolerance, checkpoint-recovery, is planned.
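The replication idea in the abstract can be illustrated with a minimal sketch. This is not the Legion scheduler or the authors' coordination architecture: the names `place_replicas`, `run_replicated`, `all_fail_probability`, and `demo_task` are hypothetical, replicas are tried one after another rather than concurrently, and site failures are assumed to be independent.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReplicaResult:
    site: str   # site whose replica produced the result
    value: int  # value returned by the task


def place_replicas(sites: List[str], n_replicas: int) -> List[str]:
    """Hypothetical placement rule: give each replica its own distinct site."""
    if n_replicas > len(sites):
        raise ValueError("need at least one distinct site per replica")
    return sites[:n_replicas]


def run_replicated(task: Callable[[str], int],
                   sites: List[str],
                   n_replicas: int) -> ReplicaResult:
    """Run one replica per chosen site and return the first one that succeeds.

    Simplification: replicas are tried sequentially here; a real system
    would schedule them independently and run them in parallel.
    """
    failures = []
    for site in place_replicas(sites, n_replicas):
        try:
            return ReplicaResult(site=site, value=task(site))
        except RuntimeError as exc:
            failures.append(f"{site}: {exc}")
    raise RuntimeError("all replicas failed: " + "; ".join(failures))


def all_fail_probability(p_site_failure: float, n_replicas: int) -> float:
    """Probability that every replica fails, assuming independent failures."""
    return p_site_failure ** n_replicas


def demo_task(site: str) -> int:
    """Toy task (assumption): fails at 'site-a', succeeds elsewhere."""
    if site == "site-a":
        raise RuntimeError("site down")
    return 42


result = run_replicated(demo_task, ["site-a", "site-b", "site-c"], 3)
```

In this toy run the replica at `site-a` fails and the one at `site-b` supplies the result. Under the independence assumption, three replicas at sites that each fail with probability 0.1 all fail with probability 0.001, which is the reliability argument for full replication.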
Similar articles
Multihybrid job scheduling for fault-tolerant distributed computing in policy-constrained resource networks
Real-time Fault-tolerant Scheduling Algorithm for Distributed Computing Systems
This article proposes a distributed real-time fault-tolerant model, a priority real-time fault-tolerant algorithm, and a distributed real-time fault-tolerant computational architecture. With this model, the problem of scheduling a weighted Directed Acyclic Graph (DAG) in a distributed computing system for high reliability can be solved in the presence of multiprocessor faults. When som...
Real-time Fault-tolerant Scheduling in Heterogeneous Distributed Systems
∗ This work was supported by the National Defense Pre-research Foundation of China. Abstract: Some work has been done on real-time fault-tolerant scheduling algorithms. However, it is all based on homogeneous distributed systems or multiprocessor systems, which have identical processors. This paper presents two fault-tolerant scheduling algorithms, RTFTNO and RTFTRC, for periodic real-t...
A New Proactive Fault Tolerant Approach for Scheduling in Computational Grid
Grid Computing provides non-trivial services to users and aggregates the power of widely distributed resources. Computational grids solve large scale scientific problems using distributed heterogeneous resources. The Grid Scheduler must select proper resources for executing the tasks with less response time and without missing the deadline. There are various reasons such as network failure, ove...
An Efficient Fault Tolerant Scheduling Approach for Computational Grid
Grid computing serves as an important technology to facilitate distributed computation. Computational grids solve large-scale scientific problems using heterogeneous, geographically distributed resources. Problems such as dispatching and scheduling of tasks are considered major issues in a computational grid environment. The Grid Scheduler must select proper resources for executing the tasks with l...